Pretraining XLNet using the original implementation¶

XLNet is a powerful language model that outperformed BERT on 20+ NLP tasks.

The original implementation requires TensorFlow 1.13.1, which is not pre-installed in Colab.

1. Installation of TensorFlow 1.13.1 and Python 2.7¶

Running import tensorflow will import the default version (currently 2.x). You can use 1.x by running a cell with the tensorflow_version magic before you run import tensorflow.

In [2]:
%tensorflow_version 1.x

Once you have specified a version via this magic, you can run import tensorflow as normal and verify which version was imported as follows:

In [3]:
import tensorflow
print(tensorflow.__version__)
1.15.2

However, version 1.13.1 is not available through this magic:

In [1]:
%tensorflow_version 1.13.1
`%tensorflow_version` only switches the major version: 1.x or 2.x.
You set: `1.13.1`. This will be interpreted as: `1.x`.


TensorFlow 1.x selected.
In [2]:
import tensorflow
print(tensorflow.__version__)
1.15.2

In this case, we can install a specific TensorFlow version in a conda environment.

For example, to install TensorFlow 1.13.1 under Python 2.7, run the following commands in the terminal (not in the notebook cells):

cd /root
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
sh Miniconda3-latest-Linux-x86_64.sh
source .bashrc 
conda create -n tf113py2 python=2.7 
conda activate tf113py2
pip install sentencepiece==0.1.5
conda install tensorflow-gpu=1.13.1
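
To confirm the environment was created correctly, you can check the interpreter and TensorFlow versions (still in the terminal; the path assumes the default Miniconda location used above):

/root/miniconda3/envs/tf113py2/bin/python --version
/root/miniconda3/envs/tf113py2/bin/python -c "import tensorflow as tf; print(tf.__version__)"

The second command should print 1.13.1.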

You can back up the conda folder to Google Drive or anywhere else.

tar czvf miniconda3.tar.gz miniconda3/

If you want to back up to Google Drive, mount it first:

In [3]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [9]:
! mkdir /content/drive/MyDrive/colab_data/conda_backup
mkdir: cannot create directory ‘/content/drive/MyDrive/colab_data/conda_backup’: File exists
In [7]:
! cp miniconda3.tar.gz  /content/drive/MyDrive/colab_data/conda_backup
! cp .bashrc /content/drive/MyDrive/colab_data/conda_backup/colab_bashrc
miniconda3.tar.gz
In [10]:
! ls /content/drive/MyDrive/colab_data/conda_backup/
colab_bashrc  miniconda3.tar.gz

2. Pretraining XLNet¶

2.1 connect to Google Drive¶

In [1]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
In [2]:
ls /content/drive/MyDrive/colab_data/conda_backup/
colab_bashrc  miniconda3.tar.gz

2.2 copy the premade conda tarball to the home directory (/root)¶

In [4]:
import os
os.chdir('/root')
import sys
print(sys.version)
3.7.10 (default, Feb 20 2021, 21:17:23) 
[GCC 7.5.0]
In [5]:
! cp /content/drive/MyDrive/colab_data/conda_backup/miniconda3.tar.gz /root/ 
! tar xaf miniconda3.tar.gz
In [6]:
! cp /content/drive/MyDrive/colab_data/conda_backup/colab_bashrc /root/ 

2.3 activate the conda env in the terminal (not in a notebook cell)¶

conda activate tf113py2
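
If conda is not on the PATH in the fresh terminal session, source the backed-up bashrc first (this assumes the conda initialization block was saved in the file we restored above):

source /root/colab_bashrc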

Please note that the TensorFlow version in the terminal is different from the one in the notebook.

2.4 test if TPU is ready¶

In [7]:
import tensorflow as tf
print("Tensorflow version " + tf.__version__)

try:
  tpu = tf.distribute.cluster_resolver.TPUClusterResolver()  # TPU detection
  print('Running on TPU ', tpu.cluster_spec().as_dict()['worker'])
except ValueError:
  raise BaseException('ERROR: Not connected to a TPU runtime; please see the previous cell in this notebook for instructions!')

tf.config.experimental_connect_to_cluster(tpu)
tf.tpu.experimental.initialize_tpu_system(tpu)
tpu_strategy = tf.distribute.experimental.TPUStrategy(tpu)
Tensorflow version 2.4.1
Running on TPU  ['10.97.191.18:8470']
INFO:tensorflow:Initializing the TPU system: grpc://10.97.191.18:8470
INFO:tensorflow:Clearing out eager caches
INFO:tensorflow:Finished initializing TPU system.
WARNING:absl:`tf.distribute.experimental.TPUStrategy` is deprecated, please use  the non experimental symbol `tf.distribute.TPUStrategy` instead.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 0, 0)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 0, 0)
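
As a quick sanity check, the strategy should report all eight TPU cores as replicas:

print(tpu_strategy.num_replicas_in_sync)  # expect 8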

2.5 connect to Google Cloud Storage¶

Google Cloud Storage is required for running XLNet on a TPU; the output files will be written to a Cloud Storage bucket.
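
If you do not have a bucket yet, you can create one with gsutil (the bucket name below is just a placeholder; bucket names must be globally unique, and the region should match your TPU):

! gsutil mb -l us-central1 gs://your-bucket-name/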

In [8]:
from google.colab import auth
auth.authenticate_user()
In [9]:
! gcloud init
Welcome! This command will take you through the configuration of gcloud.

Settings from your current configuration [default] are:
component_manager:
  disable_update_check: 'True'
compute:
  gce_metadata_read_timeout_sec: '0'
core:
  account: fangli2718@gmail.com

Pick configuration to use:
 [1] Re-initialize this configuration [default] with new settings 
 [2] Create a new configuration
Please enter your numeric choice:  2

Enter configuration name. Names start with a lower case letter and 
contain only lower case letters a-z, digits 0-9, and hyphens '-':  fangli3
Your current configuration has been set to: [fangli3]

You can skip diagnostics next time by using the following flag:
  gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).

Choose the account you would like to use to perform operations for 
this configuration:
 [1] fangli2718@gmail.com
 [2] Log in with a new account
Please enter your numeric choice:  1

You are logged in as: [fangli2718@gmail.com].

Pick cloud project to use: 
 [1] xlnet3
 [2] Create a new project
Please enter numeric choice or text value (must exactly match list 
item):  1

Your current project has been set to: [xlnet3].

Do you want to configure a default Compute Region and Zone? (Y/n)?  y

Which Google Compute Engine zone would you like to use as project 
default?
If you do not specify a zone via a command line flag while working 
with Compute Engine resources, the default is assumed.
 [1] us-east1-b
 [2] us-east1-c
 [3] us-east1-d
 [4] us-east4-c
 [5] us-east4-b
 [6] us-east4-a
 [7] us-central1-c
 [8] us-central1-a
 [9] us-central1-f
 [10] us-central1-b
 [11] us-west1-b
 [12] us-west1-c
 [13] us-west1-a
 [14] europe-west4-a
 [15] europe-west4-b
 [16] europe-west4-c
 [17] europe-west1-b
 [18] europe-west1-d
 [19] europe-west1-c
 [20] europe-west3-c
 [21] europe-west3-a
 [22] europe-west3-b
 [23] europe-west2-c
 [24] europe-west2-b
 [25] europe-west2-a
 [26] asia-east1-b
 [27] asia-east1-a
 [28] asia-east1-c
 [29] asia-southeast1-b
 [30] asia-southeast1-a
 [31] asia-southeast1-c
 [32] asia-northeast1-b
 [33] asia-northeast1-c
 [34] asia-northeast1-a
 [35] asia-south1-c
 [36] asia-south1-b
 [37] asia-south1-a
 [38] australia-southeast1-b
 [39] australia-southeast1-c
 [40] australia-southeast1-a
 [41] southamerica-east1-b
 [42] southamerica-east1-c
 [43] southamerica-east1-a
 [44] asia-east2-a
 [45] asia-east2-b
 [46] asia-east2-c
 [47] asia-northeast2-a
 [48] asia-northeast2-b
 [49] asia-northeast2-c
 [50] asia-northeast3-a
Did not print [24] options.
Too many options [74]. Enter "list" at prompt to print choices fully.
Please enter numeric choice or text value (must exactly match list 
item):  9

Your project default Compute Engine zone has been set to [us-central1-f].
You can change it by running [gcloud config set compute/zone NAME].

Your project default Compute Engine region has been set to [us-central1].
You can change it by running [gcloud config set compute/region NAME].

Your Google Cloud SDK is configured and ready to use!

* Commands that require authentication will use fangli2718@gmail.com by default
* Commands will reference project `xlnet3` by default
* Compute Engine commands will use region `us-central1` by default
* Compute Engine commands will use zone `us-central1-f` by default

Run `gcloud help config` to learn how to change individual settings

This gcloud configuration is called [fangli3]. You can create additional configurations if you work with multiple accounts and/or projects.
Run `gcloud topic configurations` to learn more.

Some things to try next:

* Run `gcloud --help` to see the Cloud Platform services you can interact with. And run `gcloud help COMMAND` to get help on any gcloud command.
* Run `gcloud topic --help` to learn about advanced features of the SDK like arg files and output formatting

Check that the bucket is accessible:

In [12]:
! gsutil ls gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/
gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/spiece.model
gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json
gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt.data-00000-of-00001
gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt.index
gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt.meta

Copy the key file to the home directory:

In [18]:
! gsutil cp gs://fangli3/bioxlnet/gs_key_fangli3dl4.json /root/
Copying gs://fangli3/bioxlnet/gs_key_fangli3dl4.json...
/ [1 files][  2.2 KiB/  2.2 KiB]                                                
Operation completed over 1 objects/2.2 KiB.                                      

Check if TensorFlow can access the bucket:

In [21]:
import tensorflow as tf
tf.io.gfile.exists('gs://fangli3/bioxlnet/v7_base_tf/step96k/model.ckpt.index')
Out[21]:
True

Set the credentials environment variable in the terminal (not in a notebook cell):

export GOOGLE_APPLICATION_CREDENTIALS=/root/gs_key_fangli3dl4.json
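
Note that the notebook kernel does not inherit variables exported in the terminal; if you also need the credentials inside the notebook, set the same variable from Python:

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/root/gs_key_fangli3dl4.json'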

Check if TensorFlow 1.13.1 can access the bucket:

In [27]:
cmd = "import tensorflow as tf\n"
cmd += "print(tf.gfile.Exists('gs://fangli3/'))\n"
f = open('check_gcs.py', 'w')
f.write(cmd + '\n')
f.close()

Run this command in the terminal: python check_gcs.py. It should print True.

2.6 download source code of XLNet¶

I have a modified version that fixes some bugs.

In [17]:
os.chdir('/root')
In [16]:
! wget http://144.34.239.101/c336d0014288370fa581fee196bbec3e/xlnet-modified.tar.gz && tar xaf xlnet-modified.tar.gz
--2021-03-03 18:13:07--  http://144.34.239.101/c336d0014288370fa581fee196bbec3e/xlnet-modified.tar.gz
Connecting to 144.34.239.101:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6192525 (5.9M) [application/x-gzip]
Saving to: ‘xlnet-modified.tar.gz’

xlnet-modified.tar. 100%[===================>]   5.91M  14.0MB/s    in 0.4s    

2021-03-03 18:13:08 (14.0 MB/s) - ‘xlnet-modified.tar.gz’ saved [6192525/6192525]

2.7 run XLNet¶

In [28]:
cmd = '''/root/miniconda3/envs/tf113py2/bin/python /root/xlnet/train.py \
  --init_checkpoint=gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt \
  --alsologtostderr \
  --log_dir=/root/bioxlnet/ \
  --num_passes=1 \
  --train_steps=1000000 \
  --learning_rate=3.33e-6 \
  --save_steps=5000 \
  --iterations=1000 \
  --nouncased \
  --num_core_per_host=8 \
  --record_info_dir=gs://fangli3/bioxlnet/data/256_tokens_ascii_1M_lines_part002/bsz_per_host16.num_core_per_host8.seq_len256.reuse_len128/tfrecords/ \
  --train_batch_size=16 \
  --seq_len=256 \
  --reuse_len=128 \
  --mem_len=192 \
  --perm_size=128 \
  --n_layer=24 \
  --d_model=1024 \
  --d_embed=1024 \
  --n_head=16  \
  --d_head=64 \
  --d_inner=4096  \
  --model_dir=gs://fangli3/bioxlnet/v6_large_tf/model/  \
  --untie_r=True \
  --mask_alpha=6 \
  --mask_beta=1 \
  --num_predict=85
'''

sh_file = 'run_train_large.sh'
with open(sh_file, 'w') as sh_f:
    sh_f.write(cmd)
! cat run_train_large.sh
/root/miniconda3/envs/tf113py2/bin/python /root/xlnet/train.py   --init_checkpoint=gs://fangli3/bioxlnet/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt   --alsologtostderr   --log_dir=/root/bioxlnet/   --num_passes=1   --train_steps=1000000   --learning_rate=3.33e-6   --save_steps=5000   --iterations=1000   --nouncased   --num_core_per_host=8   --record_info_dir=gs://fangli3/bioxlnet/data/256_tokens_ascii_1M_lines_part002/bsz_per_host16.num_core_per_host8.seq_len256.reuse_len128/tfrecords/   --train_batch_size=16   --seq_len=256   --reuse_len=128   --mem_len=192   --perm_size=128   --n_layer=24   --d_model=1024   --d_embed=1024   --n_head=16    --d_head=64   --d_inner=4096    --model_dir=gs://fangli3/bioxlnet/v6_large_tf/model/    --untie_r=True   --mask_alpha=6   --mask_beta=1   --num_predict=85

Run this command in the terminal, not in the notebook:

sh run_train_large.sh &> run_train_large.sh.log
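
If you are worried about the terminal session dropping, a common alternative is to launch the script in the background with nohup (optional; otherwise use the command above as-is):

nohup sh run_train_large.sh &> run_train_large.sh.log &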

2.8 monitor the run¶

In [31]:
! tail -f run_train_large.sh.log
INFO:tensorflow:  name = model/transformer/layer_23/ff/LayerNorm/gamma/Adam_1:0, shape = (1024,)
I0303 18:27:26.640438 139698700453760 model_utils.py:91]   name = model/transformer/layer_23/ff/LayerNorm/gamma/Adam_1:0, shape = (1024,)
INFO:tensorflow:  name = model/lm_loss/bias/Adam:0, shape = (32000,)
I0303 18:27:26.640531 139698700453760 model_utils.py:91]   name = model/lm_loss/bias/Adam:0, shape = (32000,)
INFO:tensorflow:  name = model/lm_loss/bias/Adam_1:0, shape = (32000,)
I0303 18:27:26.640599 139698700453760 model_utils.py:91]   name = model/lm_loss/bias/Adam_1:0, shape = (32000,)
INFO:tensorflow:Create CheckpointSaverHook.
I0303 18:27:29.580889 139698700453760 basic_session_run_hooks.py:527] Create CheckpointSaverHook.
INFO:tensorflow:Done calling model_fn.
I0303 18:27:30.186203 139698700453760 estimator.py:1113] Done calling model_fn.
INFO:tensorflow:TPU job name tpu_worker
I0303 18:27:36.146019 139698700453760 tpu_estimator.py:447] TPU job name tpu_worker
INFO:tensorflow:Graph was finalized.
I0303 18:27:38.824338 139698700453760 monitored_session.py:222] Graph was finalized.
WARNING:tensorflow:From /root/miniconda3/envs/tf113py2/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
W0303 18:27:38.915483 139698700453760 deprecation.py:323] From /root/miniconda3/envs/tf113py2/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from gs://fangli3/bioxlnet/v6_large_tf/model/model.ckpt-570000
I0303 18:27:39.021770 139698700453760 saver.py:1270] Restoring parameters from gs://fangli3/bioxlnet/v6_large_tf/model/model.ckpt-570000
^C
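
Since tail -f blocks until you interrupt it, you may prefer to filter the log for the most recent loss lines instead (this assumes the training script logs lines containing "loss", as the TPU estimator normally does):

! grep -i loss run_train_large.sh.log | tail -n 5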

Pretraining XLNet using the transformers library (unofficial implementation)¶

This library is developed by Hugging Face (https://huggingface.co/).

1. install the transformers library¶

In [32]:
! pip install transformers
! pip install datasets
! pip install seqeval
Collecting transformers
  Downloading https://files.pythonhosted.org/packages/f9/54/5ca07ec9569d2f232f3166de5457b63943882f7950ddfcc887732fc7fb23/transformers-4.3.3-py3-none-any.whl (1.9MB)
     |████████████████████████████████| 1.9MB 6.4MB/s 
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers) (2.23.0)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
     |████████████████████████████████| 890kB 23.8MB/s 
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers) (3.0.12)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from transformers) (20.9)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (1.19.5)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from transformers) (3.7.0)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (2019.12.20)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/71/23/2ddc317b2121117bf34dd00f5b0de194158f2a44ee2bf5e47c7166878a97/tokenizers-0.10.1-cp37-cp37m-manylinux2010_x86_64.whl (3.2MB)
     |████████████████████████████████| 3.2MB 32.3MB/s 
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers) (4.41.1)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2020.12.5)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2.10)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (1.24.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (3.0.4)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.15.0)
Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (7.1.2)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.0.1)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->transformers) (2.4.7)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers) (3.7.4.3)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers) (3.4.0)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.43-cp37-none-any.whl size=893262 sha256=8b26b029ab018abab9c1d851e6e65677be60a25825fc7c26ffce4d8f99d2479d
  Stored in directory: /root/.cache/pip/wheels/29/3c/fd/7ce5c3f0666dab31a50123635e6fb5e19ceb42ce38d4e58f45
Successfully built sacremoses
Installing collected packages: sacremoses, tokenizers, transformers
Successfully installed sacremoses-0.0.43 tokenizers-0.10.1 transformers-4.3.3
Collecting datasets
  Downloading https://files.pythonhosted.org/packages/91/8e/68011343a74dfb7bb2e59ea10b3191c7d55b43c8239356875609a56d7c71/datasets-1.4.0-py3-none-any.whl (186kB)
     |████████████████████████████████| 194kB 5.1MB/s 
Collecting huggingface-hub==0.0.2
  Downloading https://files.pythonhosted.org/packages/b5/93/7cb0755c62c36cdadc70c79a95681df685b52cbaf76c724facb6ecac3272/huggingface_hub-0.0.2-py3-none-any.whl
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from datasets) (3.7.0)
Requirement already satisfied: multiprocess in /usr/local/lib/python3.7/dist-packages (from datasets) (0.70.11.1)
Requirement already satisfied: pandas in /usr/local/lib/python3.7/dist-packages (from datasets) (1.1.5)
Collecting fsspec
  Downloading https://files.pythonhosted.org/packages/91/0d/a6bfee0ddf47b254286b9bd574e6f50978c69897647ae15b14230711806e/fsspec-0.8.7-py3-none-any.whl (103kB)
     |████████████████████████████████| 112kB 9.2MB/s 
Requirement already satisfied: pyarrow>=0.17.1 in /usr/local/lib/python3.7/dist-packages (from datasets) (3.0.0)
Requirement already satisfied: dill in /usr/local/lib/python3.7/dist-packages (from datasets) (0.3.3)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from datasets) (1.19.5)
Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.7/dist-packages (from datasets) (2.23.0)
Collecting xxhash
  Downloading https://files.pythonhosted.org/packages/e7/27/1c0b37c53a7852f1c190ba5039404d27b3ae96a55f48203a74259f8213c9/xxhash-2.0.0-cp37-cp37m-manylinux2010_x86_64.whl (243kB)
     |████████████████████████████████| 245kB 9.3MB/s 
Requirement already satisfied: tqdm<4.50.0,>=4.27 in /usr/local/lib/python3.7/dist-packages (from datasets) (4.41.1)
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from huggingface-hub==0.0.2->datasets) (3.0.12)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->datasets) (3.4.0)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->datasets) (3.7.4.3)
Requirement already satisfied: pytz>=2017.2 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets) (2018.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.7/dist-packages (from pandas->datasets) (2.8.1)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (1.24.3)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.19.0->datasets) (2020.12.5)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.7/dist-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.15.0)
Installing collected packages: huggingface-hub, fsspec, xxhash, datasets
Successfully installed datasets-1.4.0 fsspec-0.8.7 huggingface-hub-0.0.2 xxhash-2.0.0
Collecting seqeval
  Downloading https://files.pythonhosted.org/packages/9d/2d/233c79d5b4e5ab1dbf111242299153f3caddddbb691219f363ad55ce783d/seqeval-1.2.2.tar.gz (43kB)
     |████████████████████████████████| 51kB 342kB/s 
Requirement already satisfied: numpy>=1.14.0 in /usr/local/lib/python3.7/dist-packages (from seqeval) (1.19.5)
Requirement already satisfied: scikit-learn>=0.21.3 in /usr/local/lib/python3.7/dist-packages (from seqeval) (0.22.2.post1)
Requirement already satisfied: joblib>=0.11 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.21.3->seqeval) (1.0.1)
Requirement already satisfied: scipy>=0.17.0 in /usr/local/lib/python3.7/dist-packages (from scikit-learn>=0.21.3->seqeval) (1.4.1)
Building wheels for collected packages: seqeval
  Building wheel for seqeval (setup.py) ... done
  Created wheel for seqeval: filename=seqeval-1.2.2-cp37-none-any.whl size=16172 sha256=7f6f6c22e15ff18650d3e171e3978bb2fe587ca7610409dad470daa44e275709
  Stored in directory: /root/.cache/pip/wheels/52/df/1b/45d75646c37428f7e626214704a0e35bd3cfc32eda37e59e5f
Successfully built seqeval
Installing collected packages: seqeval
Successfully installed seqeval-1.2.2
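
A quick import check confirms the versions that were installed (the expected numbers come from the pip output above):

import transformers
import datasets
print(transformers.__version__)  # 4.3.3
print(datasets.__version__)      # 1.4.0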

2. install PyTorch XLA¶

XLA (Accelerated Linear Algebra) is the compiler backend that lets PyTorch run on TPUs.

In [33]:
! pip install cloud-tpu-client==0.10 https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.7-cp37-cp37m-linux_x86_64.whl
Collecting cloud-tpu-client==0.10
  Downloading https://files.pythonhosted.org/packages/56/9f/7b1958c2886db06feb5de5b2c191096f9e619914b6c31fdf93999fdbbd8b/cloud_tpu_client-0.10-py3-none-any.whl
Collecting torch-xla==1.7
  Downloading https://storage.googleapis.com/tpu-pytorch/wheels/torch_xla-1.7-cp37-cp37m-linux_x86_64.whl (133.6MB)
     |████████████████████████████████| 133.6MB 73kB/s 
Collecting google-api-python-client==1.8.0
  Downloading https://files.pythonhosted.org/packages/9a/b4/a955f393b838bc47cbb6ae4643b9d0f90333d3b4db4dc1e819f36aad18cc/google_api_python_client-1.8.0-py3-none-any.whl (57kB)
     |████████████████████████████████| 61kB 2.1MB/s 
Requirement already satisfied: oauth2client in /usr/local/lib/python3.7/dist-packages (from cloud-tpu-client==0.10) (4.1.3)
Requirement already satisfied: google-auth-httplib2>=0.0.3 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (0.0.4)
Requirement already satisfied: six<2dev,>=1.6.1 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.15.0)
Requirement already satisfied: google-api-core<2dev,>=1.13.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.16.0)
Requirement already satisfied: httplib2<1dev,>=0.9.2 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (0.17.4)
Requirement already satisfied: uritemplate<4dev,>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (3.0.1)
Requirement already satisfied: google-auth>=1.4.1 in /usr/local/lib/python3.7/dist-packages (from google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.27.0)
Requirement already satisfied: rsa>=3.1.4 in /usr/local/lib/python3.7/dist-packages (from oauth2client->cloud-tpu-client==0.10) (4.7.2)
Requirement already satisfied: pyasn1-modules>=0.0.5 in /usr/local/lib/python3.7/dist-packages (from oauth2client->cloud-tpu-client==0.10) (0.2.8)
Requirement already satisfied: pyasn1>=0.1.7 in /usr/local/lib/python3.7/dist-packages (from oauth2client->cloud-tpu-client==0.10) (0.4.8)
Requirement already satisfied: pytz in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (2018.9)
Requirement already satisfied: googleapis-common-protos<2.0dev,>=1.6.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.52.0)
Requirement already satisfied: setuptools>=34.0.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (54.0.0)
Requirement already satisfied: protobuf>=3.4.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (3.12.4)
Requirement already satisfied: requests<3.0.0dev,>=2.18.0 in /usr/local/lib/python3.7/dist-packages (from google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (2.23.0)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /usr/local/lib/python3.7/dist-packages (from google-auth>=1.4.1->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (4.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (2020.12.5)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests<3.0.0dev,>=2.18.0->google-api-core<2dev,>=1.13.0->google-api-python-client==1.8.0->cloud-tpu-client==0.10) (2.10)
Installing collected packages: google-api-python-client, cloud-tpu-client, torch-xla
  Found existing installation: google-api-python-client 1.7.12
    Uninstalling google-api-python-client-1.7.12:
      Successfully uninstalled google-api-python-client-1.7.12
Successfully installed cloud-tpu-client-0.10 google-api-python-client-1.8.0 torch-xla-1.7
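
To verify that PyTorch can see the TPU through XLA, you can request an XLA device (a minimal check; on a TPU runtime it should print a device such as xla:1):

import torch_xla.core.xla_model as xm
print(xm.xla_device())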
In [2]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [3]:
import os
os.chdir('/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet')
In [4]:
train_file = '/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt'
test_file = train_file
xla_spawn = '/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py'
In [5]:
cmd = '''
TRAIN_FILE=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt
TEST_FILE=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt
N_CPU=1
XLA_SPAWN=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py
RUN_PLM=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/language-modeling/run_plm.py
WORK_DIR=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/v1_base/
MODEL_DIR=${WORK_DIR}/huggingface_model/
NUM_EPOCHS=10
MODEL_NAME_OR_PATH=xlnet-base-cased
MAX_LENGTH=512
GRAD_ACCU_STEPS=2
PER_DEVICE_BATCH_SIZE=2
SEED=11
LOG_STEP=1
SAVE_STEP=100
LR=5e-5
WARM_UP_STEP=10
RUN_NAME=v1_base_huggingface
mkdir -p ${MODEL_DIR}

/usr/local/bin/python ${XLA_SPAWN} \
  --num_cores 8 \
  ${RUN_PLM} \
  --preprocessing_num_workers ${N_CPU} \
  --run_name ${RUN_NAME} \
  --dataloader_num_workers  ${N_CPU} \
  --model_name_or_path ${MODEL_NAME_OR_PATH} \
  --max_seq_length ${MAX_LENGTH} \
  --per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE} \
  --per_device_eval_batch_size ${PER_DEVICE_BATCH_SIZE} \
  --gradient_accumulation_steps ${GRAD_ACCU_STEPS} \
  --num_train_epochs ${NUM_EPOCHS} \
  --train_file ${TRAIN_FILE} \
  --validation_file ${TEST_FILE} \
  --do_train \
  --do_eval \
  --output_dir ${MODEL_DIR} \
  --overwrite_output_dir \
  --save_steps ${SAVE_STEP} \
  --seed ${SEED} \
  --logging_first_step \
  --logging_steps $LOG_STEP  \
  --learning_rate ${LR} \
  --warmup_steps ${WARM_UP_STEP} \
  --pad_to_max_length
'''

with open('run_xla_train.sh', 'w') as out_f:
    out_f.write(cmd)
! cat run_xla_train.sh
TRAIN_FILE=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt
TEST_FILE=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/data/tiny.txt
N_CPU=1
XLA_SPAWN=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py
RUN_PLM=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/language-modeling/run_plm.py
WORK_DIR=/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/v1_base/
MODEL_DIR=${WORK_DIR}/huggingface_model/
NUM_EPOCHS=10
MODEL_NAME_OR_PATH=xlnet-base-cased
MAX_LENGTH=512
GRAD_ACCU_STEPS=2
PER_DEVICE_BATCH_SIZE=2
SEED=11
LOG_STEP=1
SAVE_STEP=100
LR=5e-5
WARM_UP_STEP=10
RUN_NAME=v1_base_huggingface
mkdir -p ${MODEL_DIR}

/usr/local/bin/python ${XLA_SPAWN}   --num_cores 8   ${RUN_PLM}   --preprocessing_num_workers ${N_CPU}   --run_name ${RUN_NAME}   --dataloader_num_workers  ${N_CPU}   --model_name_or_path ${MODEL_NAME_OR_PATH}   --max_seq_length ${MAX_LENGTH}   --per_device_train_batch_size ${PER_DEVICE_BATCH_SIZE}   --per_device_eval_batch_size ${PER_DEVICE_BATCH_SIZE}   --gradient_accumulation_steps ${GRAD_ACCU_STEPS}   --num_train_epochs ${NUM_EPOCHS}   --train_file ${TRAIN_FILE}   --validation_file ${TEST_FILE}   --do_train   --do_eval   --output_dir ${MODEL_DIR}   --overwrite_output_dir   --save_steps ${SAVE_STEP}   --seed ${SEED}   --logging_first_step   --logging_steps $LOG_STEP    --learning_rate ${LR}   --warmup_steps ${WARM_UP_STEP}   --pad_to_max_length
In [6]:
! sh run_xla_train.sh 
WARNING:root:Waiting for TPU to be start up with version pytorch-1.7...
WARNING:root:Waiting for TPU to be start up with version pytorch-1.7...
WARNING:root:TPU has started up successfully with version pytorch-1.7
2021-03-03 18:39:42.179972: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:root:TPU has started up successfully with version pytorch-1.7
2021-03-03 18:40:02.876519: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:40:03.026307: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:40:03.027292: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:40:03.034685: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:40:03.169623: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:40:03.240399: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:40:03.311547: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:40:03.416882: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:run_plm:Process rank: -1, device: xla:1, n_gpu: 0distributed training: False, 16-bits training: False
03/03/2021 18:40:28 - WARNING - run_plm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
WARNING:datasets.builder:Using custom data configuration default-0d49a6dc5a2016e7
03/03/2021 18:40:28 - WARNING - datasets.builder -   Using custom data configuration default-0d49a6dc5a2016e7
Downloading and preparing dataset text/default (download: Unknown size, generated: Unknown size, post-processed: Unknown size, total: Unknown size) to /root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57...
Dataset text downloaded and prepared to /root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57. Subsequent calls will reuse this data.
03/03/2021 18:40:29 - WARNING - datasets.builder -   Reusing dataset text (/root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57)
03/03/2021 18:40:29 - WARNING - run_plm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
03/03/2021 18:40:29 - WARNING - datasets.builder -   Using custom data configuration default-0d49a6dc5a2016e7
03/03/2021 18:40:29 - WARNING - datasets.builder -   Reusing dataset text (/root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57)
03/03/2021 18:40:29 - WARNING - run_plm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
Downloading: 100% 760/760 [00:00<00:00, 236kB/s]
[INFO|configuration_utils.py:449] 2021-03-03 18:40:29,949 >> loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
[INFO|configuration_utils.py:485] 2021-03-03 18:40:29,951 >> Model config XLNetConfig {
  "architectures": [
    "XLNetLMHeadModel"
  ],
  "attn_type": "bi",
  "bi_data": false,
  "bos_token_id": 1,
  "clamp_len": -1,
  "d_head": 64,
  "d_inner": 3072,
  "d_model": 768,
  "dropout": 0.1,
  "end_n_top": 5,
  "eos_token_id": 2,
  "ff_activation": "gelu",
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-12,
  "mem_len": null,
  "model_type": "xlnet",
  "n_head": 12,
  "n_layer": 12,
  "pad_token_id": 5,
  "reuse_len": null,
  "same_length": false,
  "start_n_top": 5,
  "summary_activation": "tanh",
  "summary_last_dropout": 0.1,
  "summary_type": "last",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 250
    }
  },
  "transformers_version": "4.3.3",
  "untie_r": true,
  "use_mems_eval": true,
  "use_mems_train": false,
  "vocab_size": 32000
}

03/03/2021 18:40:30 - WARNING - run_plm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
03/03/2021 18:40:30 - WARNING - datasets.builder -   Using custom data configuration default-0d49a6dc5a2016e7
03/03/2021 18:40:30 - WARNING - datasets.builder -   Reusing dataset text (/root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57)
[INFO|configuration_utils.py:449] 2021-03-03 18:40:30,061 >> loading configuration file https://huggingface.co/xlnet-base-cased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/06bdb0f5882dbb833618c81c3b4c996a0c79422fa2c95ffea3827f92fc2dba6b.da982e2e596ec73828dbae86525a1870e513bd63aae5a2dc773ccc840ac5c346
[INFO|configuration_utils.py:485] 2021-03-03 18:40:30,062 >> Model config XLNetConfig {
  "architectures": [
    "XLNetLMHeadModel"
  ],
  "attn_type": "bi",
  "bi_data": false,
  "bos_token_id": 1,
  "clamp_len": -1,
  "d_head": 64,
  "d_inner": 3072,
  "d_model": 768,
  "dropout": 0.1,
  "end_n_top": 5,
  "eos_token_id": 2,
  "ff_activation": "gelu",
  "initializer_range": 0.02,
  "layer_norm_eps": 1e-12,
  "mem_len": null,
  "model_type": "xlnet",
  "n_head": 12,
  "n_layer": 12,
  "pad_token_id": 5,
  "reuse_len": null,
  "same_length": false,
  "start_n_top": 5,
  "summary_activation": "tanh",
  "summary_last_dropout": 0.1,
  "summary_type": "last",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_sample": true,
      "max_length": 250
    }
  },
  "transformers_version": "4.3.3",
  "untie_r": true,
  "use_mems_eval": true,
  "use_mems_train": false,
  "vocab_size": 32000
}

03/03/2021 18:40:30 - WARNING - run_plm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
03/03/2021 18:40:30 - WARNING - datasets.builder -   Using custom data configuration default-0d49a6dc5a2016e7
03/03/2021 18:40:30 - WARNING - datasets.builder -   Reusing dataset text (/root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57)
Downloading:   0% 0.00/798k [00:00<?, ?B/s]03/03/2021 18:40:30 - WARNING - datasets.builder -   Using custom data configuration default-0d49a6dc5a2016e7
03/03/2021 18:40:30 - WARNING - datasets.builder -   Reusing dataset text (/root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57)
Downloading: 100% 798k/798k [00:00<00:00, 5.38MB/s]
03/03/2021 18:40:30 - WARNING - run_plm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
Downloading:   0% 0.00/1.38M [00:00<?, ?B/s]03/03/2021 18:40:30 - WARNING - datasets.builder -   Using custom data configuration default-0d49a6dc5a2016e7
03/03/2021 18:40:30 - WARNING - datasets.builder -   Reusing dataset text (/root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57)
Downloading: 100% 1.38M/1.38M [00:00<00:00, 7.47MB/s]
[INFO|tokenization_utils_base.py:1786] 2021-03-03 18:40:30,898 >> loading file https://huggingface.co/xlnet-base-cased/resolve/main/spiece.model from cache at /root/.cache/huggingface/transformers/df73bc9f8d13bf2ea4dab95624895e45a550a0f0a825e41fc25440bf367ee3c8.d93497120e3a865e2970f26abdf7bf375896f97fde8b874b70909592a6c785c9
[INFO|tokenization_utils_base.py:1786] 2021-03-03 18:40:30,898 >> loading file https://huggingface.co/xlnet-base-cased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/46f47734f3dcaef7e236b9a3e887f27814e18836a8db7e6a49148000058a1a54.2a683f915238b4f560dab0c724066cf0a7de9a851e96b0fb3a1e7f0881552f53
03/03/2021 18:40:31 - WARNING - run_plm -   Process rank: -1, device: xla:0, n_gpu: 0distributed training: False, 16-bits training: False
[INFO|file_utils.py:1302] 2021-03-03 18:40:31,276 >> https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp4xu840rv
03/03/2021 18:40:31 - WARNING - datasets.builder -   Using custom data configuration default-0d49a6dc5a2016e7
03/03/2021 18:40:31 - WARNING - datasets.builder -   Reusing dataset text (/root/.cache/huggingface/datasets/text/default-0d49a6dc5a2016e7/0.0.0/293ecb642f9fca45b44ad1f90c8445c54b9d80b95ab3fca3cfa5e1e3d85d4a57)
Downloading: 100% 467M/467M [00:09<00:00, 51.4MB/s]
[INFO|file_utils.py:1306] 2021-03-03 18:40:40,731 >> storing https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
[INFO|file_utils.py:1309] 2021-03-03 18:40:40,731 >> creating metadata file for /root/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
[INFO|modeling_utils.py:1027] 2021-03-03 18:40:40,732 >> loading weights file https://huggingface.co/xlnet-base-cased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/9461853998373b0b2f8ef8011a13b62a2c5f540b2c535ef3ea46ed8a062b16a9.3e214f11a50e9e03eb47535b58522fc3cc11ac67c120a9450f6276de151af987
  0% 0/9 [00:00<?, ?ba/s][INFO|modeling_utils.py:1143] 2021-03-03 18:41:15,062 >> All model checkpoint weights were used when initializing XLNetLMHeadModel.

[INFO|modeling_utils.py:1152] 2021-03-03 18:41:15,070 >> All the weights of XLNetLMHeadModel were initialized from the model checkpoint at xlnet-base-cased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use XLNetLMHeadModel for predictions without further training.
100% 9/9 [00:44<00:00,  4.97s/ba]
100% 9/9 [00:44<00:00,  4.97s/ba]
100% 9/9 [00:44<00:00,  4.96s/ba]
100% 9/9 [00:44<00:00,  4.95s/ba]
100% 9/9 [00:45<00:00,  5.03s/ba]
100% 9/9 [00:44<00:00,  4.99s/ba]
100% 9/9 [00:45<00:00,  5.00s/ba]
100% 9/9 [00:45<00:00,  5.02s/ba]
100% 9/9 [00:44<00:00,  4.92s/ba]
100% 9/9 [00:44<00:00,  4.95s/ba]
100% 9/9 [00:44<00:00,  4.93s/ba]
100% 9/9 [00:44<00:00,  4.97s/ba]
100% 9/9 [00:44<00:00,  4.95s/ba]
100% 9/9 [00:44<00:00,  4.99s/ba]
100% 9/9 [00:44<00:00,  4.99s/ba]
100% 9/9 [00:45<00:00,  5.00s/ba]
100% 9/9 [02:45<00:00, 18.43s/ba]
100% 9/9 [02:53<00:00, 19.30s/ba]
100% 9/9 [02:56<00:00, 19.63s/ba]
100% 9/9 [03:04<00:00, 20.47s/ba]
100% 9/9 [03:10<00:00, 21.18s/ba]
100% 9/9 [03:10<00:00, 21.15s/ba]
100% 9/9 [03:12<00:00, 21.34s/ba]
100% 9/9 [03:54<00:00, 26.02s/ba]
100% 9/9 [01:39<00:00, 11.11s/ba]
 78% 7/9 [02:12<00:38, 19.32s/ba]/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  FutureWarning,
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
 78% 7/9 [01:56<00:35, 17.85s/ba]WARNING:root:TPU has started up successfully with version pytorch-1.7
100% 9/9 [02:19<00:00, 15.46s/ba]
100% 9/9 [02:12<00:00, 14.73s/ba]
100% 9/9 [02:31<00:00, 16.82s/ba]
2021-03-03 18:48:05.195537: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
100% 9/9 [02:30<00:00, 16.69s/ba]
 44% 4/9 [01:33<02:00, 24.18s/ba]/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  FutureWarning,
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
100% 9/9 [02:22<00:00, 15.87s/ba]
[INFO|trainer.py:432] 2021-03-03 18:48:19,351 >> The following columns in the training set don't have a corresponding argument in `XLNetLMHeadModel.forward` and have been ignored: .
[INFO|trainer.py:432] 2021-03-03 18:48:19,356 >> The following columns in the evaluation set don't have a corresponding argument in `XLNetLMHeadModel.forward` and have been ignored: .
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  FutureWarning,
WARNING:root:TPU has started up successfully with version pytorch-1.7
[INFO|trainer.py:837] 2021-03-03 18:48:19,404 >> ***** Running training *****
[INFO|trainer.py:838] 2021-03-03 18:48:19,404 >>   Num examples = 2918
[INFO|trainer.py:839] 2021-03-03 18:48:19,404 >>   Num Epochs = 10
[INFO|trainer.py:840] 2021-03-03 18:48:19,404 >>   Instantaneous batch size per device = 2
[INFO|trainer.py:841] 2021-03-03 18:48:19,404 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:842] 2021-03-03 18:48:19,405 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:843] 2021-03-03 18:48:19,405 >>   Total optimization steps = 910
  0% 0/910 [00:00<?, ?it/s]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  FutureWarning,
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
 89% 8/9 [02:27<00:17, 17.47s/ba]/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  FutureWarning,
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
100% 9/9 [02:28<00:00, 16.52s/ba]
 56% 5/9 [01:46<01:23, 20.76s/ba]WARNING:root:TPU has started up successfully with version pytorch-1.7
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  FutureWarning,
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
2021-03-03 18:48:32.196680: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:root:TPU has started up successfully with version pytorch-1.7
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  FutureWarning,
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
 67% 6/9 [01:55<00:51, 17.31s/ba]2021-03-03 18:48:35.876819: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:48:36.002518: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:48:41.180115: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
 78% 7/9 [02:07<00:31, 15.52s/ba]2021-03-03 18:48:47.832125: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-03-03 18:48:51.242607: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
100% 9/9 [02:14<00:00, 14.99s/ba]
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  FutureWarning,
  0% 1/910 [00:43<10:58:35, 43.47s/it]huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
WARNING:root:TPU has started up successfully with version pytorch-1.7
2021-03-03 18:49:06.879620: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
{'loss': 5.5622, 'learning_rate': 5e-06, 'epoch': 0.01}
{'loss': 5.9572, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 4.9327, 'learning_rate': 1.5e-05, 'epoch': 0.03}
{'loss': 4.0891, 'learning_rate': 2e-05, 'epoch': 0.04}
{'loss': 3.6777, 'learning_rate': 2.5e-05, 'epoch': 0.05}
{'loss': 4.2414, 'learning_rate': 3e-05, 'epoch': 0.07}
{'loss': 4.0288, 'learning_rate': 3.5e-05, 'epoch': 0.08}
{'loss': 3.2993, 'learning_rate': 4e-05, 'epoch': 0.09}
{'loss': 3.4531, 'learning_rate': 4.5e-05, 'epoch': 0.1}
{'loss': 3.5838, 'learning_rate': 5e-05, 'epoch': 0.11}
{'loss': 3.2614, 'learning_rate': 4.994444444444445e-05, 'epoch': 0.12}
{'loss': 2.8368, 'learning_rate': 4.9888888888888894e-05, 'epoch': 0.13}
{'loss': 3.3508, 'learning_rate': 4.9833333333333336e-05, 'epoch': 0.14}
{'loss': 3.6209, 'learning_rate': 4.977777777777778e-05, 'epoch': 0.15}
  2% 14/910 [06:28<1:27:10,  5.84s/it]Traceback (most recent call last):
  File "/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py", line 85, in <module>
    main()
  File "/content/drive/MyDrive/colab_data/pretrained_models/bioXLNet/transformers-4.2.2/examples/xla_spawn.py", line 81, in main
    xmp.spawn(mod._mp_fn, args=(), nprocs=args.num_cores)
  File "/usr/local/lib/python3.7/dist-packages/torch_xla/distributed/xla_multiprocessing.py", line 395, in spawn
    start_method=start_method)
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 77, in join
    timeout=timeout,
  File "/usr/lib/python3.7/multiprocessing/connection.py", line 921, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.7/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt
  2% 14/910 [06:29<6:55:47, 27.84s/it]
^C